94 research outputs found
AcListant with Continuous Learning: Speech Recognition in Air Traffic Control (EIWAC 2019)
Increasing air traffic creates many challenges for air traffic management (ATM). A general answer to these challenges is to increase automation. However, communication between air traffic controllers (ATCos) and pilots is still widely analog and far removed from digital ATM components. As the communication content matters to the ATM system, commands are still entered manually by ATCos so that the system can take the communication into account, at the cost of additional ATCo workload. To avoid this additional effort, automatic speech recognition (ASR) can analyze the communication automatically and extract the content of spoken commands. DLR, together with Saarland University, invented the AcListant® system, the first assistant-based speech recognition (ABSR) with both a high command recognition rate and a low command recognition error rate. Besides its high recognition performance, the AcListant® project revealed shortcomings with respect to the costly adaptation of the speech recognizer to different environments. To counteract this disadvantage, machine learning algorithms for the automatic adaptation of ABSR to different airports were developed within the Single European Sky ATM Research Programme (SESAR) 2020 Exploratory Research project MALORCA. To support the standardization of speech recognition in ATM, an ontology for ATC command recognition on the semantic level was developed in the SESAR Industrial Research project PJ.16-04, enabling the reuse of expensively hand-transcribed ATC communication. Finally, the results and experience feed into two further SESAR Wave 2 projects. This paper presents the evolution of ABSR from AcListant® via MALORCA and PJ.16-04 to the SESAR Wave 2 projects.
Early Callsign Highlighting using Automatic Speech Recognition to Reduce Air Traffic Controller Workload
The primary task of an air traffic controller (ATCo) is to issue instructions to pilots. However, the first contact is often initiated by the pilot. A controller assistance system that recognizes and highlights the spoken callsign as early as possible, directly from the speech data, is therefore useful. We propose to use an automatic speech recognition (ASR) system to obtain the speech-to-text transcription, from which we extract the spoken callsign. As high callsign recognition performance is required, we additionally exploit surveillance data. We obtain callsign recognition error rates of 6.2% and 8.3% for ATCo and pilot utterances, respectively, which improve to 2.8% and 4.5% when information from surveillance data is used.
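The idea of matching spoken callsigns against surveillance data can be sketched as follows. This is a minimal illustration, not the system described above: the airline-designator words, the similarity threshold, and the matching strategy are all assumptions for the sake of the example.

```python
from difflib import SequenceMatcher

# Hypothetical spoken forms for a few airline designators and ICAO digit/letter words.
AIRLINE_WORDS = {"DLH": "lufthansa", "BAW": "speedbird", "RYR": "ryanair"}
ICAO = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
        "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
        "A": "alpha", "B": "bravo", "C": "charlie", "X": "x-ray"}

def spoken_form(callsign):
    """Expand e.g. 'DLH32A' into 'lufthansa three two alpha'."""
    airline, tail = callsign[:3], callsign[3:]
    words = [AIRLINE_WORDS.get(airline, airline.lower())]
    words += [ICAO.get(ch, ch.lower()) for ch in tail]
    return " ".join(words)

def extract_callsign(transcript, surveillance_callsigns):
    """Return the surveillance callsign whose spoken form best matches any
    window of the ASR transcript -- a crude stand-in for entity boosting."""
    words = transcript.split()
    best, best_score = None, 0.0
    for cs in surveillance_callsigns:
        target = spoken_form(cs)
        n = len(target.split())
        for i in range(max(1, len(words) - n + 1)):
            window = " ".join(words[i:i + n])
            score = SequenceMatcher(None, window, target).ratio()
            if score > best_score:
                best, best_score = cs, score
    return best if best_score > 0.7 else None

print(extract_callsign("lufthansa three two alpha descend flight level eight zero",
                       ["DLH32A", "BAW7C", "RYR123"]))  # DLH32A
```

Restricting the candidate set to callsigns actually present in the surveillance picture is what makes the error-rate improvement plausible: the matcher only has to discriminate among a handful of aircraft rather than the whole airline fleet.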
ATTENTION: TARGET AND ACTUAL – THE CONTROLLER FOCUS
The main task of an air traffic controller (ATCO) is to ensure safe and efficient air traffic control (ATC). To do so, the ATCO needs to have his or her attention at the right place at the right time on the displays of the controller working position. This will become even more challenging in the future with increasing information diversity, growing levels of automation, a more complex air traffic mix, new technologies, and bigger screens. To deal with these challenges, an attention-guiding assistance system is being developed to support the ATCO. This system needs to determine the area of target attention from relevant upcoming ATC events. It should also determine the current area of attention from the ATCO's gaze, e.g., via eye-tracking, and evaluate it. If there is a mismatch between the target and the actual area of attention, the ATCO's attention focus has to be appropriately guided to the relevant areas via cues. Based on an analysis of attention and situation awareness, attention guidance mechanisms have been developed and successfully validated in human-in-the-loop trials. ATCOs felt well supported by visual, non-intrusive guidance cues and even wanted to have such functionality in today's working positions.
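The target-versus-actual comparison described above can be reduced to a simple sketch: trigger a cue when the gaze has stayed away from the target area for too long. The area names, dwell threshold, and data layout are illustrative assumptions, not the validated system's design.

```python
from dataclasses import dataclass

@dataclass
class GazeSample:
    t: float        # timestamp in seconds
    area: str       # screen area the eye-tracker maps the gaze to

def guidance_needed(target_area, gaze_samples, dwell_s=2.0):
    """Return True when the gaze has been away from the target area for
    at least dwell_s seconds -- the mismatch condition for showing a cue."""
    away_since = None
    for s in gaze_samples:
        if s.area == target_area:
            away_since = None           # gaze returned to target: reset
        elif away_since is None:
            away_since = s.t            # gaze just left the target area
        elif s.t - away_since >= dwell_s:
            return True                 # mismatch persisted: guide attention
    return False

samples = [GazeSample(0.0, "flight_list"), GazeSample(1.0, "flight_list"),
           GazeSample(2.5, "flight_list")]
print(guidance_needed("radar_sector_east", samples))  # True
```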
Automatic Speech Analysis Framework for ATC Communication in HAAWAII
Over the past years, several SESAR-funded exploratory projects focused on bringing speech and language technologies to the Air Traffic Management (ATM) domain and demonstrating their added value through successful applications. The recently ended HAAWAII project developed a generic architecture and framework, which was validated through several tasks such as callsign highlighting, pre-filling radar labels, and readback error detection. The primary goal was to support pilot and air traffic controller communication by deploying Automatic Speech Recognition (ASR) engines. Contextual information (if available) extracted from surveillance data, flight plan data, or previous communication can be exploited via entity boosting to further improve recognition performance. HAAWAII proposed various design attributes for integrating the ASR engine into the ATM framework, often depending on the concrete technical specifics of the target air navigation service providers (ANSPs). This paper gives a brief overview and provides an objective assessment of the speech processing components developed and integrated into the HAAWAII framework. Specifically, the following tasks are evaluated with respect to the application domain: (i) speech activity detection, (ii) speaker segmentation and speaker role classification, and (iii) ASR. To the best of our knowledge, the HAAWAII framework offers the best performing speech technologies for ATM, reaching high recognition accuracy (with error correction done by exploiting additional contextual data), robustness (models developed using large training corpora), and support for rapid domain transfer (to a new ATM sector with minimum investment). Two scenarios provided by ANSPs were used for testing, achieving callsign detection accuracies of about 96% and 95% for NATS and ISAVIA, respectively.
Brain–Computer Interface-Based Adaptive Automation to Prevent Out-Of-The-Loop Phenomenon in Air Traffic Controllers Dealing With Highly Automated Systems
Increasing the level of automation in air traffic management is seen as a measure to increase the performance of the service to satisfy the predicted future demand. This is expected to result in new roles for the human operator, who will mainly monitor highly automated systems and seldom intervene. Therefore, air traffic controllers (ATCos) would often work in a supervisory or control mode rather than in a direct operating mode. However, it has been demonstrated that human operators in such a role are affected by human performance issues known as the Out-Of-The-Loop (OOTL) phenomenon, consisting of a lack of attention, loss of situational awareness, and de-skilling. A countermeasure to this phenomenon has been identified in adaptive automation (AA), i.e., a system able to allocate the operative tasks to the machine or to the operator depending on their needs. In this context, psychophysiological measures have been highlighted as a powerful tool to provide a reliable, unobtrusive, and real-time assessment of the ATCo's mental state to be used as the control logic for AA-based systems. This paper presents the so-called "Vigilance and Attention Controller", a system based on electroencephalography (EEG) and eye-tracking (ET) techniques, which aims to assess in real time the vigilance level of an ATCo dealing with a highly automated human–machine interface and to use this measure to adapt the level of automation of the interface itself. The system was tested on 14 professional ATCos performing two highly realistic scenarios, one with the system disabled and one with the system enabled. The results confirmed that (i) long, highly automated tasks induce decreasing vigilance and OOTL-related phenomena; (ii) EEG measures are sensitive to these kinds of mental impairments; and (iii) AA was able to counteract this negative effect by keeping the ATCo more involved in the operative task. The results were confirmed by EEG and ET measures as well as by performance and subjective ones, providing a clear example of the potential applications and related benefits of AA.
Customization of Automatic Speech Recognition Engines for Rare Word Detection Without Costly Model Re-Training
Thanks to Alexa, Siri, or Google Assistant, automatic speech recognition (ASR) has changed our daily life during the last decade. Prototypic applications in the air traffic management (ATM) domain are available. Recently, pre-filling radar label entries with ASR support has reached the technology readiness level before industrialization (TRL6). However, seldom spoken, airspace-related words that are relevant in the ATM context remain a challenge for sophisticated applications. Open-source ASR toolkits and large pre-trained models, which allow experts to tailor ASR to new domains, can be exploited under the typical constraint that a certain amount of domain-specific training data is available, i.e., transcribed speech for adapting the acoustic and/or language models. In general, it is sufficient for a "universal" ASR engine to reliably recognize the few hundred words that form the vocabulary of the voice communications between air traffic controllers and pilots. However, for each airport some hundred additional, seldom spoken words need to be integrated. These challenging word entities comprise special airline designators and waypoint names such as "dexon" or "burok", which only appear in a specific region. When used, they are highly informative and thus require high recognition accuracy. Plug-and-play customization with minimal expert manipulation assumes that no additional training, i.e., fine-tuning of the universal ASR, is required. This paper presents an innovative approach to automatically integrate new specific word entities into the universal ASR system. The recognition rate of these region-specific word entities increases by a factor of 6 with respect to the universal ASR.
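One lightweight way to favor such region-specific words without re-training is to re-rank the recognizer's n-best list, adding a score bonus to hypotheses containing them. The sketch below assumes this kind of rescoring; the bonus value and the example hypotheses are invented for illustration and do not reproduce the paper's actual method.

```python
def rescore(nbest, boost_words, bonus=2.0):
    """Re-rank an ASR n-best list of (text, log-probability) pairs by adding
    a log-score bonus for each region-specific word a hypothesis contains.
    A lightweight alternative to fine-tuning the universal model."""
    def score(hyp):
        text, logprob = hyp
        hits = sum(1 for w in text.split() if w in boost_words)
        return logprob + bonus * hits
    return max(nbest, key=score)

nbest = [("direct to duesseldorf", -4.1),
         ("direct dexon", -5.0)]          # correct, but lower acoustic score
boost = {"dexon", "burok"}                # waypoint names added without re-training
print(rescore(nbest, boost))              # ('direct dexon', -5.0)
```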
Ensuring Safety for Artificial-Intelligence-Based Automatic Speech Recognition in Air Traffic Control Environment
This paper describes the safety assessment conducted in the SESAR2020 project PJ.10-W2-96 ASR on automatic speech recognition (ASR) technology implemented for air traffic control (ATC) centers. ASR already enables the automatic recognition of aircraft callsigns and various ATC commands, including command types, based on controller–pilot voice communications, for presentation at the controller working position. The presented safety assessment process consists of defining design requirements for the ASR technology application in normal, abnormal, and degraded modes of ATC operations. A total of eight functional hazards were identified based on the analysis of four use cases. The safety assessment was supported by top-down and bottom-up modelling and analysis of the causes of the hazards to derive system design requirements for the purpose of mitigating them. The assessment of whether the specified design requirements were achieved was supported by evidence generated from two real-time simulations with pre-industrial ASR prototypes in approach and en-route operational environments. The simulations, focusing especially on the safety aspects of the ASR application, also validated the hypotheses that ASR reduces controllers' workload and increases situational awareness. The missing validation element, i.e., an analysis of the safety effects of ASR in ATC, is the focus of this paper. As a result of the safety assessment activities, mitigations were derived for each hazard, demonstrating that the use of ASR does not increase safety risks and is, therefore, ready for industrialization.
Grammar Based Speaker Role Identification for Air Traffic Control Speech Recognition
Automatic Speech Recognition (ASR) for air traffic control is generally trained by pooling Air Traffic Controller (ATCO) and pilot data. In practice, this is motivated by the proportion of annotated data from pilots being smaller than that from ATCOs. However, due to the data imbalance between ATCO and pilot data and their varying acoustic conditions, ASR performance is usually significantly better for ATCO speech than for pilot speech. Obtaining the speaker roles requires manual effort when the voice recordings are collected using Very High Frequency (VHF) receivers and the data is noisy and single-channel, without the push-to-talk (PTT) signal. In this paper, we propose to (1) split the ATCO and pilot data using an intuitive approach exploiting ASR transcripts and (2) consider ATCO and pilot ASR as two separate tasks for Acoustic Model (AM) training. The paper focuses on applying this approach to noisy data collected using VHF receivers, as this data is helpful for training despite its noisy nature. We also developed a simple yet efficient knowledge-based system for speaker role classification based on the grammar defined by the International Civil Aviation Organization (ICAO). Our system accepts text as input, i.e., either gold annotations or transcripts generated by an ABSR system. This approach provides an average accuracy in speaker role identification of 83%. Finally, we show that training AMs separately for each task, or using a multitask approach, is better suited to the noisy data than the traditional ASR system, where all data is pooled together for AM training.
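A grammar-based role classifier of this kind can exploit callsign position: ICAO phraseology puts the callsign first in controller instructions, while pilots usually append theirs to a readback. The following sketch assumes that simplified rule and a toy airline-word list; it is an illustration of the idea, not the paper's 83%-accurate system.

```python
# Toy list of airline telephony designators that open a callsign (assumption).
CALLSIGN_AIRLINES = {"lufthansa", "speedbird", "ryanair"}

def speaker_role(transcript):
    """Classify ATCO vs. pilot from callsign position in the transcript:
    callsign-first -> ATCO instruction, callsign-last -> pilot readback."""
    words = transcript.lower().split()
    if not words:
        return "unknown"
    if words[0] in CALLSIGN_AIRLINES:
        return "atco"
    if any(w in CALLSIGN_AIRLINES for w in words[-4:]):
        return "pilot"
    return "unknown"

print(speaker_role("lufthansa three two alpha descend flight level eight zero"))   # atco
print(speaker_role("descending flight level eight zero lufthansa three two alpha"))  # pilot
```

Because the rule operates on text only, it works equally on gold annotations and on ABSR output, which is exactly the property the abstract highlights.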
How to Measure Speech Recognition Performance in the Air Traffic Control Domain? The Word Error Rate is only half of the truth
Applying Automatic Speech Recognition (ASR) in the domain of analogue voice communication between air traffic controllers (ATCos) and pilots involves more end-user requirements than just transforming spoken words into text. Perfect word recognition is useless as long as the semantic interpretation is wrong. For an ATCo it is of no importance whether the words of a greeting are correctly recognized; a wrong recognition of a greeting should, however, not disturb the correct recognition of, e.g., a "descend" command. Recently, 14 European partners from the Air Traffic Management (ATM) domain agreed on a common set of rules, i.e., an ontology on how to annotate the speech utterances of an ATCo. This paper first extends the ontology to pilot utterances and then compares different ASR implementations on the semantic level by introducing command recognition, command recognition error, and command rejection rates. The implementation used in this paper achieves a command recognition rate better than 94% for Prague Approach, even when the word error rate (WER) is above 2.5%.
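The command-level metrics can be computed by comparing recognized commands against gold annotations per utterance. One possible definition is sketched below; the exact counting rules (and the example command strings) are assumptions, as projects may define the rates differently.

```python
def command_metrics(gold, recognized):
    """Compare per-utterance gold command annotations with recognized ones.
    recognition rate = correctly recognized commands / all gold commands
    error rate       = recognized-but-wrong commands  / all gold commands
    rejection rate   = commands with no output (None) / all gold commands"""
    total = correct = wrong = rejected = 0
    for g, r in zip(gold, recognized):
        total += 1
        if r is None:
            rejected += 1
        elif r == g:
            correct += 1
        else:
            wrong += 1
    return {"recognition": correct / total,
            "error": wrong / total,
            "rejection": rejected / total}

gold = ["DLH32A DESCEND 80 FL", "BAW7C REDUCE 180 kt", "RYR123 TURN_LEFT 120"]
reco = ["DLH32A DESCEND 80 FL", "BAW7C REDUCE 160 kt", None]
print(command_metrics(gold, reco))
```

This distinction is the point of the title: a hypothesis with a low WER can still carry a wrong value ("160" instead of "180"), which counts as a full command recognition error at the semantic level.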